Class 31

DATA1220-55, Fall 2024

Sarah E. Grabinski

2024-11-20

Review: 2 Numeric Variables

  • Typically visualized using a scatter plot

  • Explanatory/independent/predictor variable on the \(X\) axis

  • Response/dependent/outcome variable on the \(Y\) axis

Review: Describing Associations

  • Independence: an increase in \(X\) is not associated with a change in \(Y\)
  • Positive association: an increase in \(X\) is associated with an increase in \(Y\)
  • Negative association: an increase in \(X\) is associated with a decrease in \(Y\)
  • Weak association: data points are widely scattered around the overall trend
  • Strong association: data points are tightly clustered around the overall trend

Practice

Which image shows a positive relationship between the explanatory and response variables?

Income vs Education

Age vs Survival

Practice

Which image shows a strong relationship between the explanatory and response variables?

Correlation

  • Describes the direction and strength of the association between 2 numeric variables
  • A correlation ranges from -1 to 1

    • A perfect negative correlation equals -1

    • A perfect positive correlation equals 1

  • A correlation of 0 indicates no linear association (the variables may still be related in a non-linear way)
  • Different techniques for linear (Pearson) vs monotonic but possibly non-linear (Spearman) relationships
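A minimal sketch of the two techniques, using only the Python standard library. The data values and the helper names (`pearson`, `ranks`, `spearman`) are made up for illustration and are not from the course materials. Spearman is computed here as the Pearson correlation of the ranks, assuming no tied values.

```python
import math

def pearson(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y_curved = [v ** 3 for v in x]       # always increasing, but curved
y_u = [(v - 3.5) ** 2 for v in x]    # U-shaped: related to x, but not linearly

r_curved = pearson(x, y_curved)      # below 1 because the trend is not a line
rho_curved = spearman(x, y_curved)   # 1 because the ordering is perfectly preserved
r_u = pearson(x, y_u)                # about 0 despite the clear U-shaped pattern

print(round(r_curved, 3), round(rho_curved, 3), round(r_u, 3))
```

The U-shaped case shows why a correlation near 0 does not by itself mean the variables are unrelated.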

Linear vs Non-Linear

Interpreting Correlations

Correlation scale:

  • 1: perfect positive correlation
  • 0.9: high positive correlation
  • 0.5: low positive correlation
  • 0: no correlation
  • -0.5: low negative correlation
  • -0.9: high negative correlation
  • -1: perfect negative correlation

Example: Poverty vs Graduation Rate

What’s the response variable?

Response Variable: Percent of people in poverty

Example: Poverty vs Graduation Rate

What’s the explanatory variable?

Explanatory variable: Percent of people who graduated high school

Example: Poverty vs Graduation Rate

Describe the relationship between these 2 variables.

Correlation: strong, negative

Example: Poverty vs Graduation Rate

Which of the following is the most likely correlation? A. 0.60 B. -0.25 C. -0.75 D. 0.35

Describe the relationship between these 2 variables.

Example: Poverty vs Graduation Rate

Which of the following is the most likely correlation? C. -0.75

Describe the relationship between these 2 variables.

Example: Correlation vs Causation

Do ice cream sales cause drowning incidents?

Example: Correlation vs Causation

  • As ice cream sales increase, the number of drownings increases

  • Strong, positive correlation

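  • High temperatures increase both ice cream consumption and the number of people swimming, so temperature (a confounding variable) explains the association; ice cream sales do not cause drownings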
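The ice cream example can be sketched as a small simulation. All numbers below are simulated, not real data: the temperature range, the slopes, and the noise levels are assumptions chosen only to illustrate the idea. Temperature drives both variables, and ice cream sales have no direct effect on drownings in this setup.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated daily temperatures (hypothetical range of 10 to 35 degrees)
temps = [rng.uniform(10, 35) for _ in range(2000)]

# Both outcomes depend on temperature plus independent random noise
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temps]

# Across all days the two outcomes move together
r_all = pearson(ice_cream, drownings)

# Hold temperature roughly constant: keep only days between 24 and 26 degrees
band = [i for i, t in enumerate(temps) if 24 <= t <= 26]
r_band = pearson([ice_cream[i] for i in band], [drownings[i] for i in band])

print(f"all days: r = {r_all:.2f}")
print(f"days at 24-26 degrees: r = {r_band:.2f}")
```

Under these assumptions the correlation across all days is strong and positive, while among days with similar temperatures it is close to 0, which is the pattern expected when a third variable explains the association.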